Overview

Introduction


We explore the effect of COVID-19 pandemic on crime rates across different cities in the US to get a glimpse into the future of crime in these unprecedented circumstances.

Our goal is to use COVID and Crime data to make future predictions on crime rates in this pandemic. We also look at the crime rates before and after the pandemic to gain insight into how quarantine and self-isolation measures influence human behaviour leading to an increase in specific types of crime.

Click on the location markers on the left to view their analysis !

Key Results


Some text desccription

Coming Soon


Some text desccription

Chicago

Row

Impulse response function (criminal damage)

Impulse response function (vehicle theft)

Row

Chicago Crime Summary

VAR forecast for criminal damage

VAR forecast for vehicle theft

VAR forecast for assault

VAR forecast for battery

VAR forecast for deceptive

VAR forecast for theft

Row

Daily confirmed cases of COVID19 in Chicago

Summary

This is text link

  • one
  • two
  • three

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Providence

Boston

Philadelphia

Los Angeles

Atlanta

Seattle

---
title: "COVID-19 and US Crime"
author: ""
output: 
  flexdashboard::flex_dashboard:
    theme: flatly
    orientation: rows
    social: menu
    source: embed
---

```{r setup, include=FALSE}
library(ggplot2)
library(plotly)
library(plyr)
library(flexdashboard)
library(RSocrata)
library(tidyverse)
library(tsbox) # transform data into time series
library(xts)
library(COVID19) # to get data about covid 19
library(forecast) #ariRI model
library(vars) #VAR and Causality
library(dygraphs)
library(leaflet)
library(htmlwidgets)
#[INCOMPLETE] Graphs in Chicago have been converted here to Plotly HTML Widgets 
# Make some noisily increasing data [Testing Purposes]
set.seed(955)
dat <- data.frame(cond = rep(c("A", "B"), each=10),
                  xvar = 1:20 + rnorm(20,sd=3),
                  yvar = 1:20 + rnorm(20,sd=3))

#Load Chicago Data
covid19_CH <- covid19("USA", level = 3) %>%
  # this cook county contains chicago
  filter(administrative_area_level_3 == "Cook",
         administrative_area_level_2 == "Illinois" ) %>%
  # filter out days when confirmed is zero or one
  # becasue when it was 2 for a very long time
  filter(confirmed > 2)

```

Overview {.storyboard}
=======================================================================

### Introduction

```{r}

Location <- c("Providence ","Los Angeles","Chicago","Boston","Seattle","Atlanta","Philadelphia" )
lat <- c(41.8240,33.82377,41.78798,42.361145,47.714965,33.753746, 39.952583)
lng <- c( -71.4128,-118.2668,-87.7738,-71.057083,-122.127166 ,-84.386330,-75.165222)
df <- data.frame(Location, lat,lng)
 jsCode <- paste0('
 function(el, x) {
  var marker = document.getElementsByClassName("leaflet-marker-icon leaflet-zoom-animated leaflet-interactive");
  marker[0].onclick=function(){window.open("http://individual.utoronto.ca/nayan/datafest.html#providence");};
  marker[1].onclick=function(){window.open("http://individual.utoronto.ca/nayan/datafest.html#los-angeles");};
  marker[2].onclick=function(){window.open("http://individual.utoronto.ca/nayan/datafest.html#chicago");};
  marker[3].onclick=function(){window.open("http://individual.utoronto.ca/nayan/datafest.html#boston");};
  marker[4].onclick=function(){window.open("http://individual.utoronto.ca/nayan/datafest.html#seattle");};
  marker[5].onclick=function(){window.open("http://individual.utoronto.ca/nayan/datafest.html#atlanta");};
  marker[6].onclick=function(){window.open("http://individual.utoronto.ca/nayan/datafest.html#philadelphia");};
}
 ')
m <- leaflet() %>%
  addTiles() %>%  # Add default OpenStreetMap map tiles
  addMarkers(data=df,)%>%
   onRender(jsCode)
m  # Print the map
```

***

We explore the effect of COVID-19 pandemic on crime rates across different cities in the US to get a glimpse into the future of crime in these unprecedented circumstances. 

Our goal is to use COVID and Crime data to make future predictions on crime rates in this pandemic. We also look at the crime rates before and after the pandemic to gain insight into how quarantine and self-isolation measures influence human behaviour leading to an increase in specific types of crime.

Click on the location markers on the left to view their analysis !

### Key Results

```{r}
```

***

Some text desccription

### Coming Soon

```{r}
```

***

Some text desccription


  
Chicago
=======================================================================





Row{data-height=300}
-------------------------------------
   

### Impulse response function (criminal damage)
```{r get the data for this city}
if (exists("chicago")) is.data.frame(get("chicago")) else chicago <- RSocrata::read.socrata(
  "https://data.cityofchicago.org/resource/ijzp-q8t2.csv?$where=year >= 2014",
  app_token = "hPU78MH7zKApdpUv4OVCInPOQ")
```

```{r}
# add date
chicago <- chicago %>%
  mutate(Date = substr(date, start = 1, stop = 10)) %>%
  mutate(y_month  = substr(date, start = 1, stop = 7)) %>%
  mutate(month = substr(date, start = 6, stop = 7))

# summary of all crime
chicago_summary <- chicago %>%
  group_by(primary_type) %>%
  summarise(number_of_crime = n()) %>%
  arrange(desc(number_of_crime))
```

```{r extract cases}
# extract top 5 crime
top5crime <- chicago %>%
  filter(primary_type %in% head(chicago_summary$primary_type, 5)) %>%
  group_by(Date, primary_type) %>%
  tally() %>%
  spread(primary_type, n)

# rename columns
colnames(top5crime) <- c('time',
                         "assault",
                         "battery",
                         "criminal",
                         'deceptive',
                         "theft")
top5crime <- na.omit(top5crime)
# create time series
top5crime_xts <- ts_xts(top5crime[,1:2])

for (i in (3:ncol(top5crime))){
  temp_xts <- ts_xts(top5crime[, c(1,i)])
  top5crime_xts <- merge(top5crime_xts, temp_xts)
}

# extract difference, change per day
top5crime_diff <- na.omit(diff(top5crime_xts))
```

```{r covid 19 related exploration, message=F, warning=F}
# extract for tranforming into time series data
ts_CH <- covid19_CH %>% 
  dplyr::select(date, confirmed) %>%
  ts_xts()

# try first log difference
ts_diff_CH <- na.omit(diff(ts_CH))

covid19_CH_diff <- data.frame(diff(covid19_CH$confirmed))
colnames(covid19_CH_diff)[1] = "confirmed"
covid19_CH_diff$date = covid19_CH$date[2:length(covid19_CH$date)]

# time as integer
covid19_CH_diff$timeInt = as.numeric(covid19_CH_diff$date)
# make a copy to avoid perfect collinearity
covid19_CH_diff$timeIid = covid19_CH_diff$timeInt

# GAMM model
# 50 too overfit. 15 looks decent
gamCH <- gamm4::gamm4(confirmed ~  s(timeInt, k=50), random = ~(1|timeIid), 
	data=covid19_CH_diff, family=poisson(link='log'))

toPredict = data.frame(time = seq(covid19_CH_diff$date[1], 
                                          covid19_CH_diff$date[length(covid19_CH_diff$date)],
                                  by = '1 day'))
toPredict$timeInt = as.numeric(toPredict$time)
# obtain forecast
forecast <- data.frame(exp(do.call(cbind, mgcv::predict.gam(gamCH$gam, toPredict, se.fit=TRUE))))

# access residuals
CH_res <- data.frame(covid19_CH_diff$confirmed - forecast$fit)

# transform into time series
CH_res$time = covid19_CH_diff$date
colnames(CH_res)[1] = "residuals"

col_order <- c("time", "residuals")
CH_res <- CH_res[, col_order]

CH_res_ts <- ts_xts(CH_res)
```

```{r top 5 crime VAR}
# specify common time range
# start from when covid was a thing
# end on May 25, to avoid effect of protests related to George Floyid.
common_time <- seq.Date(start(CH_res_ts), as.Date("2020-05-25"), by = "day")

# combine time series of crime and covid
combined_diff <- merge(top5crime_diff[paste(common_time[1],
                                            common_time[length(common_time)],
                                            sep = "/")],
                       CH_res_ts[paste(common_time[1],
                                            common_time[length(common_time)],
                                            sep = "/")])

```

```{r construct var}
optimal_assault <- VARselect(na.omit(combined_diff)[,c(1,6)], type = 'none', lag.max = 10)
optimal_battery <- VARselect(na.omit(combined_diff)[,c(2,6)], type = 'none', lag.max = 10)
optimal_criminal <- VARselect(na.omit(combined_diff)[,c(3,6)], type = 'none', lag.max = 10)
optimal_deceptive <- VARselect(na.omit(combined_diff)[,c(4,6)], type = 'none', lag.max = 10)
optimal_theft <- VARselect(na.omit(combined_diff)[,c(5,6)], type = 'none', lag.max = 10)

VAR_assault <- VAR(y=as.ts(na.omit(combined_diff)[,c(1,6)]), p=optimal_assault$selection[1])
VAR_battery <- VAR(y=as.ts(na.omit(combined_diff)[,c(2,6)]), p=optimal_battery$selection[1])
VAR_criminal <- VAR(y=as.ts(na.omit(combined_diff)[,c(3,6)]),
                    p=optimal_criminal$selection[1])
VAR_deceptive <- VAR(y=as.ts(na.omit(combined_diff)[,c(4,6)]),
                     p=optimal_deceptive$selection[1])
VAR_theft <- VAR(y=as.ts(na.omit(combined_diff)[,c(5,6)]), p=optimal_theft$selection[1])
```

```{r, warning=F, message=F}
vehicle <- chicago %>%
  filter(primary_type == 'MOTOR VEHICLE THEFT')%>%
  group_by(Date, primary_type) %>%
  tally() %>%
  spread(primary_type, n)

colnames(vehicle) <- c('time', 'vehicle')
vehicle_xts <- ts_xts(na.omit(vehicle)[,1:2])
vehicle_diff <- na.omit(diff(vehicle_xts))

combined_diff2 <- merge(vehicle_diff[paste(common_time[1],
                                            common_time[length(common_time)],
                                            sep = "/")],
                       CH_res_ts[paste(common_time[1],
                                            common_time[length(common_time)],
                                            sep = "/")])
optimal_vehicle <- VARselect(na.omit(combined_diff2)[,c(1,2)], type = 'none', lag.max = 10)
VAR_vehicle <- VAR(y=as.ts(na.omit(combined_diff2)[,c(1,2)]), p=optimal_vehicle$selection[1])
```

```{r irf}
# use car theft and criminal damage
par(mfrow = c(1,2))
lags = c(1:25)

# criminal damange
# only significant one is from covid to crime
irf_criminal_2 <- irf(VAR_criminal, 
                         impulse = "residuals", 
                         response = "criminal", 
                         n.ahead = 24,
                         ortho = F)


# ggplot version of it.
irf_criminal_2_gg <- data.frame(
  irf_criminal_2$irf$residuals[,1],
  irf_criminal_2$Lower$residuals[,1],
  irf_criminal_2$Upper$residuals[,1]
)

colnames(irf_criminal_2_gg) <- c("mean", "lower", "upper")

i1 <- ggplot(irf_criminal_2_gg, aes(x = lags)) +
  geom_line(aes(y = mean), color = "black") +
  geom_hline(yintercept = 0, color = "blue") +
  geom_ribbon(aes(ymin = lower, ymax = upper), fill = "lightblue", alpha = 0.5) +
  theme_classic() +
  ggtitle("How many more criminal damage cases per day there will be
          after 1 confirmed COVID-19 case") +
  xlab("Number of days after 1 COVID-19 case")+
  ylab("Number of criminal damange per day")

ggplotly(i1)
```

### Impulse response function (vehicle theft)
```{r}
# vehicle theft
# only significant one is from covid to crime
irf_vehicle_2 <- irf(VAR_vehicle,
                     impulse = "residuals",
                     response = "vehicle",
                     n.ahead = 24)


# ggplot version of it
irf_vehicle_2_gg <- data.frame(
  irf_vehicle_2$irf$residuals[,1],
  irf_vehicle_2$Lower$residuals[,1],
  irf_vehicle_2$Upper$residuals[,1]
)

colnames(irf_vehicle_2_gg) <- c("mean", "lower", "upper")

i2 <- ggplot(irf_vehicle_2_gg, aes(x = lags)) +
  geom_line(aes(y = mean), color = "black") +
  geom_hline(yintercept = 0, color = "blue") +
  geom_ribbon(aes(ymin = lower, ymax = upper), fill = "lightblue", alpha = 0.5) +
  theme_classic() +
  ggtitle("How many more vehicle theft cases per day there will be
          after 1 confirmed COVID-19 case") +
  xlab("Number of days after 1 COVID-19 case")+
  ylab("Number of vehicle theft per day")

ggplotly(i2)
```





Row {data-height=400 .tabset .tabset-fade }
-------------------------------------
   
### Chicago Crime Summary

```{r}
# add date
chicago <- chicago %>%
  mutate(Date = substr(date, start = 1, stop = 10)) %>%
  mutate(y_month  = substr(date, start = 1, stop = 7)) %>%
  mutate(month = substr(date, start = 6, stop = 7))

# summary of all crime
chicago_summary <- chicago %>%
  group_by(primary_type) %>%
  summarise(number_of_crime = n()) %>%
  arrange(desc(number_of_crime))

# looks life theft is seeing sharp drop

# year to year comparison
plt <- chicago %>%
  dplyr::select(y_month, month, primary_type, year) %>%
  filter(primary_type %in% chicago_summary$primary_type[1:5], y_month != "2020-06") %>%
  count(year, month, primary_type) %>%
  na.omit()%>% ggplot(aes(x=month, y=n, group = year, color = as.character(year))) + geom_line() + facet_free(~primary_type,scales = "free", space = "free")+xlab("Month") +
           ylab("Cases") + theme(legend.title = element_blank())

ggplotly(plt)
```   
 
### VAR forecast for criminal damage
```{r}
interval_value_formatter <- "function(num, opts, seriesName, g, row, col) {
  value = g.getValue(row, col);
  if(value[0] != value[2]) {
    lower = Dygraph.numberValueFormatter(value[0], opts);
    upper = Dygraph.numberValueFormatter(value[2], opts);
    return '[' + lower + ', ' + upper + ']';
  } else {
    return Dygraph.numberValueFormatter(num, opts);
  }
}"
```

```{r}
f_criminal <- forecast(VAR_criminal)
f_criminal$forecast$criminal %>% 
  {cbind(actuals=.$x, forecast_mean=.$mean,
         lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
         lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
  dygraph(main = 'Daily VAR forcast for assault in Chicago', 
          ylab = 'Daily (expected) cases') %>%
  dyAxis("y", valueFormatter = interval_value_formatter) %>%
  dySeries("actuals", color = "black") %>%
  dySeries("forecast_mean", color = "blue", label = "forecast") %>%
  dySeries(c("lower_80", "forecast_mean", "upper_80"),
           label = "80%", color = "blue") %>%
  dySeries(c("lower_95", "forecast_mean", "upper_95"),
           label = "95%", color = "blue")%>%
  dyLegend(labelsSeparateLines=TRUE) %>%
  dyRangeSelector() %>%
  dyAxis("x", label = paste("Numebr of days since", common_time[1]))
```

### VAR forecast for vehicle theft
```{r}
f_vehicle <- forecast(VAR_vehicle)
f_vehicle$forecast$vehicle %>% 
  {cbind(actuals=.$x, forecast_mean=.$mean,
         lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
         lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
  dygraph(main = 'Daily VAR forcast for assault in Chicago', 
          ylab = 'Daily (expected) cases') %>%
  dyAxis("y", valueFormatter = interval_value_formatter) %>%
  dySeries("actuals", color = "black") %>%
  dySeries("forecast_mean", color = "blue", label = "forecast") %>%
  dySeries(c("lower_80", "forecast_mean", "upper_80"),
           label = "80%", color = "blue") %>%
  dySeries(c("lower_95", "forecast_mean", "upper_95"),
           label = "95%", color = "blue")%>%
  dyLegend(labelsSeparateLines=TRUE) %>%
  dyRangeSelector() %>%
  dyAxis("x", label = paste("Numebr of days since", common_time[1]))
```

### VAR forecast for assault
```{r}
f_assault <- forecast(VAR_assault)
f_assault$forecast$assault %>% 
  {cbind(actuals=.$x, forecast_mean=.$mean,
         lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
         lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
  dygraph(main = 'Daily VAR forcast for assault in Chicago', 
          ylab = 'Daily (expected) cases') %>%
  dyAxis("y", valueFormatter = interval_value_formatter) %>%
  dySeries("actuals", color = "black") %>%
  dySeries("forecast_mean", color = "blue", label = "forecast") %>%
  dySeries(c("lower_80", "forecast_mean", "upper_80"),
           label = "80%", color = "blue") %>%
  dySeries(c("lower_95", "forecast_mean", "upper_95"),
           label = "95%", color = "blue")%>%
  dyLegend(labelsSeparateLines=TRUE) %>%
  dyRangeSelector() %>%
  dyAxis("x", label = paste("Numebr of days since", common_time[1]))
```

### VAR forecast for battery
```{r}
f_battery <- forecast(VAR_battery)
f_battery$forecast$battery %>% 
  {cbind(actuals=.$x, forecast_mean=.$mean,
         lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
         lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
  dygraph(main = 'Daily VAR forcast for battery in Chicago', 
          ylab = 'Daily (expected) cases') %>%
  dyAxis("y", valueFormatter = interval_value_formatter) %>%
  dySeries("actuals", color = "black") %>%
  dySeries("forecast_mean", color = "blue", label = "forecast") %>%
  dySeries(c("lower_80", "forecast_mean", "upper_80"),
           label = "80%", color = "blue") %>%
  dySeries(c("lower_95", "forecast_mean", "upper_95"),
           label = "95%", color = "blue")%>%
  dyLegend(labelsSeparateLines=TRUE) %>%
  dyRangeSelector() %>%
  dyAxis("x", label = paste("Numebr of days since", common_time[1]))
```

### VAR forecast for deceptive
```{r}
f_deceptive <- forecast(VAR_deceptive)
f_deceptive$forecast$deceptive %>% 
  {cbind(actuals=.$x, forecast_mean=.$mean,
         lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
         lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
  dygraph(main = 'Daily VAR forcast for deceptive practice in Chicago', 
          ylab = 'Daily (expected) cases') %>%
  dyAxis("y", valueFormatter = interval_value_formatter) %>%
  dySeries("actuals", color = "black") %>%
  dySeries("forecast_mean", color = "blue", label = "forecast") %>%
  dySeries(c("lower_80", "forecast_mean", "upper_80"),
           label = "80%", color = "blue") %>%
  dySeries(c("lower_95", "forecast_mean", "upper_95"),
           label = "95%", color = "blue")%>%
  dyLegend(labelsSeparateLines=TRUE) %>%
  dyRangeSelector() %>%
  dyAxis("x", label = paste("Numebr of days since", common_time[1]))
```

### VAR forecast for theft
```{r}
f_theft <- forecast(VAR_theft)
f_theft$forecast$theft %>% 
  {cbind(actuals=.$x, forecast_mean=.$mean,
         lower_95=.$lower[,"95%"], upper_95=.$upper[,"95%"],
         lower_80=.$lower[,"80%"], upper_80=.$upper[,"80%"])} %>%
  dygraph(main = 'Daily VAR forcast for theft in Chicago', ylab = 'Daily (expected) cases') %>%
  dyAxis("y", valueFormatter = interval_value_formatter) %>%
  dySeries("actuals", color = "black") %>%
  dySeries("forecast_mean", color = "blue", label = "forecast") %>%
  dySeries(c("lower_80", "forecast_mean", "upper_80"),
           label = "80%", color = "blue") %>%
  dySeries(c("lower_95", "forecast_mean", "upper_95"),
           label = "95%", color = "blue")%>%
  dyLegend(labelsSeparateLines=TRUE) %>%
  dyRangeSelector() %>%
  dyAxis("x", label = paste("Numebr of days since", common_time[1]))
```

Row {data-height=300} 
-----------------------------------------------------------------------





### Daily confirmed cases of COVID19 in Chicago

```{r}

# if (exists("chicago")) is.data.frame(get("chicago")) else chicago <- RSocrata::read.socrata(
#   "https://data.cityofchicago.org/resource/ijzp-q8t2.csv?$where=year >= 2014",
#   app_token = "hPU78MH7zKApdpUv4OVCInPOQ")
# 
# 
# # add date
# chicago <- chicago %>%
#   mutate(Date = substr(date, start = 1, stop = 10)) %>%
#   mutate(y_month  = substr(date, start = 1, stop = 7)) %>%
#   mutate(month = substr(date, start = 6, stop = 7))
# 
# # summary of all crime
# chicago_summary <- chicago %>%
#   group_by(primary_type) %>%
#   summarise(number_of_crime = n()) %>%
#   arrange(desc(number_of_crime))
# 
# # looks life theft is seeing sharp drop
# 
# # year to year comparison
# plt <- chicago %>%
#   dplyr::select(y_month, month, primary_type, year) %>%
#   filter(primary_type %in% chicago_summary$primary_type[1:5]) %>%
#   count(year, month, primary_type) %>%
#   na.omit()%>% ggplot(aes(x=month, y=n, group = year, color = as.character(year))) + geom_line() + facet_free(~primary_type,scales = "free", space = "free")+xlab("Month") +
#            ylab("Cases") + theme(legend.title = element_blank())
# 
# ggplotly(plt)

#TEST CHUNK UNCOMMENT TO REDUCE FILE RUN TIME [Design]
# n <- 20
# x1 <- rnorm(n); x2 <- rnorm(n)
# y1 <- 2 * x1 + rnorm(n)
# y2 <- 3 * x2 + (2 + rnorm(n))
# A <- as.factor(rep(c(1, 2), each = n))
# df <- data.frame(x = c(x1, x2), y = c(y1, y2), A = A)
# fm <- lm(y ~ x + A, data = df)
# 
# p <- ggplot(data = cbind(df, pred = predict(fm)), aes(x = x, y = y, color = A))
# p <- p + geom_point() + geom_line(aes(y = pred))
# ggplotly(p)
dygraph(ts_diff_CH,
         main = "Daily confirmed cases of
         COVID-19 in Chicago")%>%
  dySeries("cd7b965f", label = "Chicago")%>%
  dyLegend(show = "always", hideOnMouseOut = FALSE)
```


### Summary

This is text [link](http://www.example.com)

- one
- two 
- three

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Providence
=======================================================================

Boston
=======================================================================

Philadelphia
=======================================================================

Los Angeles
=======================================================================

Atlanta
=======================================================================

Seattle
=======================================================================